Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/CORE: Fix memory cache lookup return value handling #7737

Merged
merged 1 commit into from
Dec 5, 2021

Conversation

dmitrygx
Copy link
Member

@dmitrygx dmitrygx commented Dec 1, 2021

What

Fix memory cache lookup return value handling.

Why ?

The return value could be either UCS_OK or UCS_ERR_NO_ELEM.

How ?

  1. Re-organize the code of branches.
  2. Add useful assertions.
  3. Remove incorrect assertion.
  4. Remove extra brackets around a condition.

}
if (ucs_likely(status == UCS_ERR_NO_ELEM)) {
goto out_host_mem;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed and force-pushed (the PR is small enough)

@yosefe
Copy link
Contributor

yosefe commented Dec 1, 2021

@dmitrygx the problem was extra detect overhead when not needed?


if ((mem_info->type != UCS_MEMORY_TYPE_UNKNOWN) &&
(mem_info->sys_dev != UCS_SYS_DEVICE_ID_UNKNOWN)) {
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move line 517 inside this if and inverse it's condition

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove this return?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* memory domains. In any case, the memory type cache is not expected to
* return HOST memory type.
*/
ucs_assert(mem_info->type != UCS_MEMORY_TYPE_HOST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this assert to line 505

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, done

@dmitrygx
Copy link
Member Author

dmitrygx commented Dec 1, 2021

@dmitrygx the problem was extra detect overhead when not needed?

yes, if it returns UCS_ERR_NO_ELEM, it is always HOST memory - since we don't add HOST memory buffers to memtype cache.
it moves the behavior to what we have introduced in a29b2f7 (PR #6180)

also, it fixes incorrect assertion:

if (status != UCS_ERR_NO_ELEM) {
        if (ucs_likely(status != UCS_OK)) {
            ucs_assert(status == UCS_ERR_NO_ELEM); // <---- this one, it was checked above
            goto out_host_mem;
        }
...
}


if ((mem_info->type != UCS_MEMORY_TYPE_UNKNOWN) &&
(mem_info->sys_dev != UCS_SYS_DEVICE_ID_UNKNOWN)) {
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove this return?

goto out_host_mem;
}

ucs_assertv((status == UCS_OK) || (status == UCS_ERR_UNSUPPORTED),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we get UCS_ERR_UNSUPPORTED?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated PR to return UCS_ERR_UNSUPPORTED in case of MEMTYPE is disabled to distinguish between no element (HOST mem buffer) or always use memory detection (slowpath)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the advantage? it adds another branch (line 507)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the advantage? it adds another branch (line 507)

it fixes the problem with memory type detection if MEMTYPE cache wasn't enabled - we should go to memory detection through MDs. without this check - we assumed that memory type is HOST.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting...
maybe we should set mem_info->type=UNKNOWN in this case?

Copy link
Contributor

@yosefe yosefe Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it would be weird to set mem_info in case of error return,
so guess better restore an equivalent of the prev logic where we checked if memtype_cache was enabled in the first place. keep it UNSUPPORTED

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so guess better restore an equivalent of the prev logic where we checked if memtype_cache was enabled in the first place. keep it UNSUPPORTED

do you mean checking status == UCS_ERR_UNSUPPORTED and doing memory detection in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if ((status == UCS_ERR_UNSUPPORTED) ||
(mem_info->type == UCS_MEMORY_TYPE_UNKNOWN) ||
(mem_info->sys_dev == UCS_SYS_DEVICE_ID_UNKNOWN)) {
if (ucs_unlikely((status == UCS_ERR_UNSUPPORTED) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pls simplify, i think we can avoid "goto" now
  2. host memory is the fast path and should be checked first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host memory is the fast path and should be checked first

done

pls simplify, i think we can avoid "goto" now

can't remove goto - two places use it - and can't merge them into single if statement, because context->num_mem_type_detect_mds == 0 must be checked before calling ucs_memtype_cache_lookup()

@dmitrygx dmitrygx merged commit 3818f46 into openucx:master Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants